Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, which has an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report our observations in detail, both those specific to individual regions and those common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply these to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from the existing literature and from phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions about Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description of IE in terms of universal and region-specific characteristics, which facilitates accent conversion and the adaptation of existing ASR and TTS systems to different Indian accents.
Analysis of Indian English (IE) pronunciation variabilities is useful in building systems for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) synthesis in the Indian context. Typically, these pronunciation variabilities have been explored by comparing IE pronunciation with Received Pronunciation (RP). However, exploring these variabilities requires pronunciation data labelled at the phonetic level, which is scarce for IE. Moreover, the versatility of IE stems from the influence of the large diversity of the speakers' mother tongues and from demographic and regional differences. Prior linguistic works have characterised features of IE variabilities qualitatively by reporting phonetic rules that represent such variations relative to RP. These qualitative descriptions often lack quantitative descriptors and a data-driven analysis of diverse IE pronunciation data to characterise IE at the phonetic level. To address these issues, in this work we consider a corpus, Indic TIMIT, containing a large set of IE varieties from 80 speakers from various regions of India. We present an analysis that obtains, in a data-driven manner, a new set of phonetic rules representing IE pronunciation variabilities relative to RP. We do this using 15,974 phonetic transcriptions, of which 13,632 were obtained manually in addition to those already part of the corpus. Furthermore, we validate the rules obtained from the analysis against the existing phonetic rules to identify the relevance of the obtained rules, and we test the efficacy of Grapheme-to-Phoneme (G2P) conversion developed from the obtained rules, using Phoneme Error Rate (PER) as the performance metric.
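The abstract names PER as the metric but does not say how it is computed. As a minimal sketch, assuming the common definition (total Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes), it could be computed as follows. The function names and the example phoneme labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """Corpus-level PER: total edits / total reference phonemes (assumed definition)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of sequences")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


if __name__ == "__main__":
    # Hypothetical phoneme labels, for illustration only.
    reference = [["th", "ih", "ng"], ["w", "ao", "t", "er"]]
    predicted = [["t", "ih", "ng"], ["v", "ao", "t", "er"]]
    print(f"PER = {phoneme_error_rate(reference, predicted):.3f}")
```

Pooling edits over the whole corpus, rather than averaging per-utterance rates, keeps short transcriptions from dominating the score; whether the paper pools or averages is not stated in the abstract.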
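In this study, listeners of varied Indian nativities are asked to listen to and recognise TIMIT utterances spoken by American speakers. We collect three kinds of responses from each listener as they recognise an utterance: 1. a sentence difficulty rating, 2. a speaker difficulty rating, and 3. a transcription of the utterance. From these transcriptions, the word error rate (WER) is calculated and used as a metric to evaluate the similarity between the recognised and the original sentences. The sentences selected for this study are categorised into three groups, easy, medium and hard, based on the frequency of occurrence of the words in them. We observe that the sentence difficulty ratings, the speaker difficulty ratings and the WERs all increase from the easy to the hard category of sentences. We also compare human speech recognition (HSR) performance with that of three automatic speech recognition (ASR) systems built from the following combinations of acoustic model (AM) and language model (LM): ASR1) an AM trained on recordings from speakers of Indian origin and an LM built on TIMIT text, ASR2) an AM trained on recordings from native American speakers and an LM built on text from the LibriSpeech corpus, and ASR3) an AM trained on recordings from native American speakers and an LM built on LibriSpeech and TIMIT text. We observe that HSR performance is similar to that of ASR1, whereas ASR3 achieves the best performance. A speaker-nativity-wise analysis shows that utterances from speakers of some nativities are more difficult for Indian listeners to recognise than those from a few other nativities.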